Structural speaker adaptation using maximum a posteriori approach and a Gaussian distributions merging technique

Authors

Olivier Bellot

Driss Matrouf

Pascal Nocera

Georges Linarès

Jean-François Bonastre

Abstract

The aim of speaker adaptation techniques is to enhance speaker-independent acoustic models so that their recognition accuracy comes as close as possible to that obtained with speaker-dependent models. Recently, a technique based on a hierarchical structure and the maximum a posteriori criterion (SMAP) was proposed. In this paper, as in SMAP, we assume that the acoustic model parameters are organized in a tree containing all the Gaussian distributions. Each node in that tree represents a cluster of Gaussian distributions sharing a common affine transformation that represents the mismatch between training and test conditions. To estimate this affine transformation, we propose a new technique based on merging Gaussians and standard MAP adaptation. This new technique is very fast and allows good unsupervised adaptation of both means and variances, even with a small amount of adaptation data. This adaptation strategy has shown a significant performance improvement in a large-vocabulary speech recognition task, both alone and combined with MLLR adaptation.
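The abstract names two building blocks, merging Gaussians and standard MAP adaptation, without giving formulas. The sketch below illustrates textbook versions of both for one-dimensional Gaussians: a moment-matched merge and the usual MAP mean update with a prior weight. The function names, the prior weight `tau`, and the one-dimensional setting are assumptions made for illustration; this is not necessarily the exact procedure used in the paper.

```python
def merge_gaussians(weights, means, variances):
    """Merge weighted 1-D Gaussians into one Gaussian by moment matching.

    Assumption: the merged Gaussian keeps the mixture's overall mean and
    variance. The paper's merging rule may differ.
    """
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, means)) / total
    # Second moment of the mixture: E[x^2] = sum_i w_i * (var_i + mean_i^2)
    second = sum(w * (v + m * m)
                 for w, m, v in zip(weights, means, variances)) / total
    return mean, second - mean * mean


def map_mean(prior_mean, tau, frames, posteriors):
    """Standard MAP update of a Gaussian mean.

    tau (hypothetical name) is the prior weight: larger values keep the
    estimate closer to prior_mean when little adaptation data is available.
    posteriors holds the occupation probability of each frame.
    """
    occupancy = sum(posteriors)
    weighted_sum = sum(g * x for g, x in zip(posteriors, frames))
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)


if __name__ == "__main__":
    merged_mean, merged_var = merge_gaussians([1.0, 1.0], [0.0, 2.0], [1.0, 1.0])
    print(merged_mean, merged_var)
    print(map_mean(0.0, 10.0, [2.0] * 10, [1.0] * 10))
```

With little adaptation data the MAP estimate stays near the prior mean, and it moves toward the data average as the occupancy grows, which is the behavior that makes MAP usable with small amounts of adaptation data.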

Similar resources

Structural linear model-space transformations for speaker adaptation

Within the framework of speaker adaptation, a technique based on a tree structure and the maximum a posteriori criterion was proposed (SMAP). In SMAP, parameter estimation at each node in the tree is based on the assumption that the mismatch between the training and adaptation data is a Gaussian PDF whose parameters are estimated using the maximum likelihood criterion. To avoid poor tran...

Speaker normalization and adaptation based on linear transformation

We propose novel speaker-independent (SI) modeling and speaker adaptation based on a linear transformation. An SI model and speaker-dependent (SD) models are usually generated using the same preprocessing of acoustic data. This straightforward preprocessing causes a serious problem. Probability distributions of the SI models become broad and the SI models do not give good initial estimates for ...

Improved speaker verification through probabilistic subspace adaptation

In this paper we propose a new adaptation technique for improved text-independent speaker verification with limited amounts of training data using Gaussian mixture models (GMMs). The technique, referred to as probabilistic subspace adaptation (PSA), employs a probabilistic subspace description of how a client’s parametric representation (i.e. GMM) is allowed to vary. Our technique is compared t...

Discriminative Transformation for Sufficient Adaptation in Text-Independent Speaker Verification

In conventional Gaussian Mixture Model – Universal Background Model (GMM-UBM) text-independent speaker verification applications, the discriminability between speaker models and the universal background model (UBM) is crucial to the system’s performance. In this paper, we present a method based on heteroscedastic linear discriminant analysis (HLDA) that can enhance the discriminability between spea...
